ABSTRACT

Presented at a symposium on "The Structure of Concept
Attainment Abilities Project: Final Report and Critique," this paper
discusses the methodological aspects of the project. The discussion
centers around a "Guide to the Multivariate Methods," which is
provided in the paper. The basic guide-posts are the types of
analysis and the types of content. The latter include concept
attainment by subject-matter field or combined, and cognitive
abilities, or both. The three factor analytic techniques used to
examine the data and obtain derived factor solutions are described.
The next major phase of the analysis, the construction of 56 cognitive
tests for the 1970 study, the factor analysis of these tests, and the
reduced battery for use in the 1971 testing, is briefly noted. The most
important part of the project, the study of relationships between the
concept attainment measures and the cognitive abilities measures, is
then described. The interbattery approach of Tucker, used to
determine if there were factors common to the battery of cognitive
abilities tests and the battery of concept attainment measures, is
discussed. (DE)

An Evaluation of the Multivariate Methodology of the Project

Harry H. Harman



While my evaluation is limited strictly to the methodological aspects 
of the project, I do want to admit my acquaintance with and interest in the 
more general aspects of the project. Further, my knowledge and evaluation 
of the methodology is not restricted to the foregoing papers. Not only did
I receive the 500-page draft copy of the monograph by the Harrises [7], but
I also had the pleasure of being chairman and discussant two years ago of an 
AERA Symposium in which the Harrises' paper on the classification of cognitive 
abilities [6] gave an early report on that part of the project (Chapter IV). 

Before I home in on the multivariate methodology let me give a very 
brief overview of my perception of the total project. In effect a type of 
systems analysis was made of four subject-matter fields, identifying the con- 
cepts that set these fields apart (Chapter II). Then measures (tests) were 
developed for the determination of the degree of attainment of these concepts, 
both in terms of "content" (i.e., the concept itself) and in terms of the 
"level of understanding" of the concept (designated "tasks"). In order to get 
some indication of the potential predictability of this type of achievement 
(i.e., concept attainment), the investigators explored several classification 
systems for cognitive abilities in building their own test battery. A variety 
of analyses were performed in the separate studies of the concept attainment 
measures (Chapter III) and of the cognitive abilities measures (Chapter IV), 
with the culminating analyses involved in studying the relationships between 
the two sets of variables (Chapter V).

Now, we turn our attention to an examination of the data summary and 
analysis. It would be the understatement of the day to say that this ambitious
undertaking yielded a great wealth of results. The goals of the study were
attained, in large measure, through the very effective planning and guidance 
of the analysis provided by Mr. Harris. 

The first problem noted in his paper is not of the usual multivariate 
type. Basically, the question is how best to use the item data on the completely 
crossed design of concepts by tasks in constructing the tests of concept attain- 
ment. I checked with item-analysis specialists at ETS. When it became evident 
that they had not had previous experience with this type of data and did not 
have any immediate recommendations for its resolution, I could only agree with 
Harris on his approach — separate scores by rows and columns and proceed with 
traditional item analysis. But I also concur with him that this area merits 
further exploration. 

The remainder of my discussion will be concerned with the more familiar 
multivariate problems of the project. I hadn't proceeded very far into this 
area before I saw the need for a map of this vast terrain. I share this with 
you in the form of a "Guide to the Multivariate Methods" in the Handout. The
basic guide-posts are the types of analysis in the rows, and the types of con- 
tent in the columns. The latter include concept attainment by subject-matter 
field or combined, and cognitive abilities, or both. A particular locale is 
found according to the analysis-and-content coordinates and is marked by a 
symbol that denotes concepts, tasks, or cognitive abilities, or their combina- 
tions, with the table numbers from the draft monograph where the data can be 
found. Also shown on the map are lines indicating the course of flow among the
various forms of analysis. 

The investigators rely heavily on factor analytic techniques in the
examination of the voluminous data, which matches my personal predilection. A whole
range of problems is attacked with factor analysis, from exploratory efforts
aimed at data reduction and determination of structure to the study of
relationships between batteries of tests. The factor analysis work is
sophisticated and pertinent to the objectives.

For the direct (or initial) factorization of a correlation matrix they
always employ three methods: one due to Kaiser and Caffrey (Alpha); one
developed by Harris (referred to as R-S); and a maximum likelihood method of Jöreskog.
The solutions yielded by all three of these methods have the property of being
independent of the original scale or metric of the variables. That is why
Harris selected these procedures, and it is a perfectly sound basis.

Both Alpha and Harris' method employ as a point of departure Rao's 
canonical factor analysis [10] in which the canonical correlations are determined 
between the observed variables and estimated factor scores. But they depart from 
Rao's statistical criterion for the number of common factors; Kaiser and Caffrey 
use the notion of psychometric generalizability while Harris uses the squared 
multiple correlation for estimating communality, in the sense of Guttman's best 
lower bound, for determining the number of common factors. Furthermore, in order 
to assure that the solutions are scale-free, in Harris' method the variables are 
rescaled in the metric of the unique parts while in Alpha they are rescaled in 
the metric of the common parts, and upon conclusion of the factoring the results 
are transformed back into the original metric of the variables. 
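
The rescaling in the metric of the unique parts can be illustrated with a small numerical sketch. The Python code below is not the project's program; it assumes that uniqueness is estimated from the diagonal of the inverse matrix (in a correlation metric, one minus the squared multiple correlation), that the roots greater than one of the rescaled matrix are retained, and that the loadings are then carried back to the original metric. The function name `harris_rescaled_factors` and the example loadings are hypothetical.

```python
import numpy as np

def harris_rescaled_factors(cov):
    """Factor a covariance or correlation matrix in the metric of the unique parts."""
    cov = np.asarray(cov, dtype=float)
    # Assumed uniqueness estimate: 1 / (diagonal of the inverse), i.e. the
    # residual variance of each variable regressed on all the others.
    u = np.sqrt(1.0 / np.diag(np.linalg.inv(cov)))
    # Rescale every variable by its unique standard deviation.
    rescaled = cov / np.outer(u, u)
    eigvals, eigvecs = np.linalg.eigh(rescaled)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the roots greater than one (the positive roots of cov - U^2).
    keep = eigvals > 1.0 + 1e-10
    loadings_rescaled = eigvecs[:, keep] * np.sqrt(eigvals[keep] - 1.0)
    # Transform back into the original metric of the variables.
    return u[:, None] * loadings_rescaled

# Hypothetical one-factor example, analyzed in two different metrics.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
corr = np.outer(true_loadings, true_loadings)
np.fill_diagonal(corr, 1.0)
scale = np.array([1.0, 2.0, 0.5, 10.0])      # arbitrary change of units
cov = corr * np.outer(scale, scale)

a_corr = harris_rescaled_factors(corr)
a_cov = harris_rescaled_factors(cov)
```

Under these assumptions the loadings obtained from the rescaled variables are simply the original loadings multiplied by the change of units, which is the scale-free property referred to above.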

Upon rotation of the initial solutions, Harris noted that the derived
solutions based on Alpha factor analysis usually yielded only one factor while
his method produced the largest number (with maximum likelihood producing an
intermediate number). He also surmised that solutions with several factors
probably were overdifferentiating because the derived oblique factors tended to
correlate highly. The inclination of the project staff to follow a conservative
course in interpreting a minimal number of factors agrees with my personal
bias to err on the side of under-factoring, that is, a willingness to have the
factor analysis provide a "first approximation" model for the empirical data.

Before continuing with a discussion of derived solutions one might ask, 
why three separate methods? Several different factorizations are required for 
the application of Harris' strategy for factor interpretation. That
strategy [5] calls for derived solutions (both orthogonal and oblique) based 
on several initial factorizations, and from the results to accept as "the 
important substantive findings those factors that are robust with respect to 
method," i.e., factors that tend to include the same variables across method. 
These they call "comparable common factors" or CCF's. It should be clear that
their concern is with potential idiosyncrasies of particular methods that may 
lead to unwarranted substantive conclusions. The strategy does not disclose 
the effect of chance errors in the data. It occurred to me that if they were 
to become concerned with sampling problems, they might employ a procedure 
recently applied to factor analysis by Pennell [9] that has been advanced by 
Tukey [13] as the "jack-knife," named for the boy scout's rough-and-ready 
general purpose instrument. 
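
As a rough illustration of the jackknife idea, the Python sketch below deletes one case at a time, recomputes a statistic, and forms pseudovalues from which an estimate and a standard error follow. It is a generic delete-one version, not Pennell's procedure; the function name `jackknife` and the simulated scores are hypothetical, and the statistic shown (the largest root of a correlation matrix) is chosen only because it needs no matching of factors across the deleted-case solutions.

```python
import numpy as np

def jackknife(data, statistic):
    """Delete-one jackknife estimate and standard error of a statistic."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    full = statistic(data)
    # Recompute the statistic with each case (row) deleted in turn.
    leave_one_out = np.array(
        [statistic(np.delete(data, i, axis=0)) for i in range(n)]
    )
    # Pseudovalues: n times the full-sample value minus (n - 1) times
    # each deleted-case value.
    pseudovalues = n * full - (n - 1) * leave_one_out
    estimate = pseudovalues.mean(axis=0)
    std_error = pseudovalues.std(axis=0, ddof=1) / np.sqrt(n)
    return estimate, std_error

def largest_root(scores):
    """Largest latent root of the correlation matrix of the scores."""
    return np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))[-1]

# Hypothetical data: 50 cases on 4 tests.
rng = np.random.default_rng(0)
scores = rng.normal(size=(50, 4))
root_estimate, root_se = jackknife(scores, largest_root)
```

Applying the same idea to rotated loadings would require aligning the factors from each deleted-case solution before forming pseudovalues, a step this sketch does not attempt.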

Returning to their use of derived factor solutions, we note from the Guide 
that they rely on three types: one orthogonal and two oblique. There is no 
question but that varimax is generally accepted as the preferred orthogonal 
solution. For oblique solutions, the issue is not so clear. The methods used 
were developed by Harris and Kaiser [4] in their 1964 paper, "Oblique factor 
analytic solutions by orthogonal transformations." Two of these procedures 
were used in the project, namely: (1) a method in which the reproduced
correlation matrix can be represented by a set of independent (but correlated)
clusters; and (2) a method in which the minor product moment of the factor
pattern matrix is approximately proportional to the matrix of factor correlations.
These two oblique methods are designated "independent cluster" and "A'A
proportional to L," respectively. These methods were compared with several other
techniques for oblique transformations by Hakstian [1], who demonstrated in
1971 (and has provided further evidence at an AERA program this morning [2])
that they produce solutions best exemplifying simple structure.

The "independent cluster" factor solution can be expected to fit only
simple data while the "A'A proportional to L" transformation can fit more
complex data. Hence, it is not surprising that they found they had to reject the
hypothesis of independent clusters after obtaining such solutions. The more
complex oblique solutions were obtained from each of the three initial
factorizations and interpreted by means of the comparable common factors strategy, as
indicated by the flow lines into the CCF boxes in the Guide.

In studying the concepts and tasks for the four subject-matter fields
separately, the numbers of factors in the several solutions varied considerably,
making the CCF strategy inappropriate. On the other hand, when they performed
factor analyses of the concept measures and of the task measures for the four
fields combined they found their interpretation strategy to be very effective.
As noted by Mrs. Harris, this led to the overall conclusion of a comparable
common factor representing each of the subject fields (with some slight overlap)
for the concepts and for the tasks, as shown at the bottom of Table 1 in her
Handout. Apparently the analysis required data revealing the contrasts among the
four subject-matter fields to identify factors with fields and distinguish common
from unique variance clearly.

For the analysis of concept attainment, there remains the important
question of possible concept-task interaction in the crossed design of
their study. This was the second problem to which Mr. Harris referred in 
his paper. It seems quite reasonable that they turned to Tucker's three- 
mode factor analysis [12] for answers to this problem. This procedure was 
applied to the 1970 data for each of the four subject-matter fields, and 
aside from making some necessary compromises, they found that there were no 
important concept-task interactions. Having reached this conclusion for the 
1970 data, they did not repeat the analysis for the 1971 data. While they 
appeared pleased with the potentialities of three-mode factor analysis they 
also found its current computer programs limited in several respects. 

The next major phase of analysis in the project deals with the dimensions 
of a battery of cognitive abilities tests, as shown in the two right-most 
columns of the Guide. First they reviewed the Guilford, Guttman, and
Thurstone schemata for classifying cognitive abilities. This led them to 
construct 56 tests for the 1970 study. Upon factor analyzing these, and using 
their comparable common factors strategy, which worked quite well in this case, 
they determined the factors that they wanted preserved in a reduced battery for 
wider use in the 1971 testing (summarized in Table 2 of Mrs. Harris' Handout). 

As Harris noted in his paper, the problem of selecting the subset of tests, 
indicated by the feedback loop at the bottom of the last column in the Guide, 
is not a trivial one. Their approach relied on the coefficients of the oblique 
factors (i.e., pattern matrix) in deciding whether to select a particular 
variable as a measure of a factor. But these regression coefficients "go the 
other way." They might have obtained the regression estimate of an oblique 
factor on the set of variables and thereby have the regression coefficients "go 
the right way" to facilitate the selection of important tests for a factor. 
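
The distinction between the two directions of regression can be made concrete with a short sketch. The pattern matrix holds the weights for estimating variables from factors; the weights for estimating a factor from the variables are obtained here by the usual regression formula, the inverse of the correlation matrix times the factor structure (pattern times factor correlations). The Python code below assumes that formulation; the function name and the numerical values are hypothetical and are not taken from the monograph.

```python
import numpy as np

def factor_regression_weights(corr, pattern, factor_corr):
    """Weights for estimating oblique factors from the observed variables."""
    # Structure matrix: correlations of the variables with the factors.
    structure = pattern @ factor_corr
    # Regression of each factor on the full set of variables: R^{-1} S.
    weights = np.linalg.solve(corr, structure)
    return structure, weights

# Hypothetical two-factor oblique solution for four tests.
pattern = np.array([[0.8, 0.0],
                    [0.7, 0.1],
                    [0.0, 0.6],
                    [0.1, 0.7]])
factor_corr = np.array([[1.0, 0.4],
                        [0.4, 1.0]])
common = pattern @ factor_corr @ pattern.T
corr = common + np.diag(1.0 - np.diag(common))   # unit diagonal

structure, weights = factor_regression_weights(corr, pattern, factor_corr)
# Squared multiple correlation of each factor with the set of tests.
factor_smc = np.diag(structure.T @ weights)
```

Tests with large entries in `weights` are the ones that contribute most to the estimate of a factor, which is the sense in which these coefficients "go the right way" for test selection.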



All the preceding analyses, in a sense, were preparatory for the most
important part of the project: the study of relationships between the concept
attainment measures and the cognitive abilities measures. Four essentially
different attacks on this problem were considered, namely:

(1) Conventional factor analysis of the two batteries, simultaneously, 
without making a distinction between them; 

(2) On some basis, designating one battery as "dependent" and projecting 
its variables into the common factor space of the other or "inde- 
pendent" battery; 

(3) Canonical correlation and canonical variates approach; 

(4) Interbattery factor analysis. 

Although they performed factor analyses of the concept attainment measures and 
cognitive abilities measures treated as a single battery, they found the results 
to be less valuable than those obtained from the interbattery approach of 
Tucker [11]. 

While the interbattery procedure was designed originally to determine the 
stability of factors in two batteries of tests (assumed to depend on the same 
factors), it was used in this project to determine if there were factors common 
to the battery of cognitive abilities tests and the battery of concept attain- 
ment measures (without prior design for these two types of measures to depend 
on the same factors). The specific modifications they made were in using a
different statistical test for the number of factors and in employing the
interbattery factor matrices for both test batteries in getting an orthogonal and
the two types of oblique derived solutions.
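
The core idea of an interbattery analysis, namely that only the correlations between the two batteries are factored, can be sketched as follows. This Python fragment assumes a singular value decomposition of the cross-correlation matrix with the singular values split evenly between the two sets of loadings; that normalization is an assumption made for illustration, and neither Tucker's own scaling nor the project's modifications are reproduced. The function name and the small loading matrices are hypothetical.

```python
import numpy as np

def interbattery_factors(cross_corr, n_factors):
    """Loadings for two batteries from their cross-correlation matrix only."""
    u, s, vt = np.linalg.svd(cross_corr, full_matrices=False)
    # Assumed convention: share each singular value equally between batteries.
    root = np.sqrt(s[:n_factors])
    loadings_1 = u[:, :n_factors] * root
    loadings_2 = vt[:n_factors].T * root
    return loadings_1, loadings_2

# Hypothetical batteries of 3 and 4 tests sharing two common factors.
a1 = np.array([[0.7, 0.1], [0.6, 0.2], [0.1, 0.5]])
a2 = np.array([[0.5, 0.3], [0.4, 0.4], [0.2, 0.6], [0.6, 0.0]])
cross = a1 @ a2.T                      # between-battery correlations

l1, l2 = interbattery_factors(cross, n_factors=2)
```

The recovered loadings reproduce the between-battery correlations, although they differ from `a1` and `a2` by a transformation, which is why derived (rotated) solutions are obtained afterward.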

The number of interbattery factors was determined, in part, from the number
of significant canonical correlations. Another use of canonical variate analysis
was in getting squared multiple correlations of the concept attainment
measures as estimated from the 31 cognitive abilities tests. In addition,
they computed squared multiple correlations (i.e., communalities) of the
concept attainment measures as estimated from the interbattery factors.
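
Both quantities mentioned here can be computed directly from the partitioned correlation matrix. The Python sketch below obtains canonical correlations as the singular values of the cross-correlation matrix after each battery has been orthogonalized, and obtains the squared multiple correlation of each measure in one battery from all the tests in the other. The function names and the simulated data (three "ability" tests and two "concept" measures) are hypothetical; no significance test for the number of canonical correlations is included.

```python
import numpy as np

def inv_sqrt(matrix):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(matrix)
    return (vecs / np.sqrt(vals)) @ vecs.T

def canonical_correlations(r11, r22, r12):
    """Canonical correlations between battery 1 and battery 2."""
    k = inv_sqrt(r11) @ r12 @ inv_sqrt(r22)
    return np.linalg.svd(k, compute_uv=False)

def smc_from_other_battery(r22, r12):
    """Squared multiple correlation of each battery-1 measure on battery 2."""
    # Row j of r12 holds the correlations of measure j with battery 2;
    # the result for measure j is r_j' R22^{-1} r_j.
    return np.einsum("ij,ij->i", r12, np.linalg.solve(r22, r12.T).T)

# Hypothetical data: 200 cases, 3 ability tests, 2 concept measures.
rng = np.random.default_rng(0)
abilities = rng.normal(size=(200, 3))
weights = np.array([[0.6, 0.2], [0.3, 0.5], [0.0, 0.4]])
concepts = abilities @ weights + rng.normal(size=(200, 2))

r = np.corrcoef(np.hstack([concepts, abilities]), rowvar=False)
r11, r12, r22 = r[:2, :2], r[:2, 2:], r[2:, 2:]
rho = canonical_correlations(r11, r22, r12)
smc = smc_from_other_battery(r22, r12)
```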

As noted above, four different attitudes about the two batteries were
considered for the study of relationships between them. In ruling out one
of them, they asserted that an approach involving canonical correlation and
canonical variates implies a component-type solution. But that need not be
the case. In a method developed by Bary Wingersky [14] at ETS, such a
procedure is employed in getting a conventional factor solution. The method
leads to complete and consistent solutions by satisfying the following two
conditions (which are not exclusively satisfied by components solutions):

(1) Orthogonal factor scores are derived from the observed data and
the factor pattern by a least-squares fit of the original data to
its reconstruction from the factor model;

(2) The regression of the original data on these factor scores reproduces
the factor pattern.

It should be noted, of course, that while the factor scores are implicit in the
theoretical development, they are not actually computed for the cases in the
sample. To avoid components solutions, each set of factor scores is the canonical
variate associated with a canonical correlation that explains the most variance
over all the data. Thus canonical correlations guide decisions concerning the
number of factors to select, but in terms of reliability (as in Alpha factor
analysis) rather than in a statistical sampling sense.

In the normal application of this method, a partition of the data is
selected for the calculation of canonical correlations that will produce the
largest estimate of reliability. I thought of adapting this method to the
study of the relationships between the two batteries of the project by
a priori assignment of them to the two partitions. The specific example used
was data for the 1971 boys; the battery of 31 cognitive abilities tests was
put in one partition and the battery of 48 combined subject-matter tasks was
put in the other partition. Two- and three-factor solutions were obtained for
comparison with the Harrises' results, although the new interbattery method
pointed to possibly four reliable factors for the two batteries. Derived
orthogonal (varimax) and oblique (A'A proportional to L) solutions were also
determined for the two-factor case since that is the number of interbattery
factors exhibited in Table 8 of Mrs. Harris' Handout. No attempt will be made
here to compare the results of the two approaches to interbattery factors.
Suffice it to say that the squared multiple correlations for estimating the
48 task measures by the new method exceeded those of the original method by
an average of only .03.

In conclusion, I want to thank the Wisconsin Research and Development
Center for Cognitive Learning for inviting me to comment upon the methodology
of this project that has, for all its complexity, been so well organized and
carefully executed. Beyond its important substantive contribution to our
understanding of how cognitive abilities combine in subject-matter concept
attainment, the design of this project, its sensitive and sensible adaptation of
methodology to objective, is a paradigm for what can be achieved with
multivariate research techniques.

AERA Symposium, February 26, 1973, New Orleans

GUIDE TO MULTIVARIATE METHODOLOGY OF THE PROJECT

All entries in the boxes refer to table numbers (of the text and appendices of the
draft monograph) where the results of the analysis will be found.

[Chart: the box entries (table numbers) and the flow lines among the analyses
are not legible in this copy. The legible headings, symbol key, and footnotes
are listed below.]

Columns (type of content):
    Concept Attainment (Achievement)
        By subject-matter fields*
        Combined fields (1971 only)
    Cognitive Abilities (Aptitudes)
        1970 (other column headings not legible)

Rows (type of analysis):
    Correlation matrices
    Initial factor solutions
        Alpha
        Harris R-S
        Maximum likelihood
    Spearman single factor
    Derived oblique factor solutions
        Independent cluster (for simple data)
        A'A proportional to L (for complex data)
    Comparable common factors (CCF)
    Squared multiple correlations (from canonical variate analysis)
    Interbattery factor analysis
    Three mode factor analysis
    Derived orthogonal factor solution
        Varimax (normalized)

Symbol key (box shapes not legible):
    Concepts
    Tasks
    Cognitive abilities
    Concepts and tasks
    Concepts and cognitive abilities
    Tasks and cognitive abilities

* For both 1970 and 1971 unless a single year is indicated in parentheses.

** Selection of 31 tests for 1971 Study from 56 tests in 1970 Study.

*** Except for the number of factors, the initial factor solutions were
not given in the draft monograph.